Abstract

We present a computer simulation, inspired by the Penna model, to help
understand the effect of genetic coding tables on population dynamics. To
represent populations, we used real and artificial gene sequences. We coded
these sequences using different amino acid tables found in Nature: the standard
table as well as the tables used by mitochondria and some eukaryotes. Contrary
to common belief, we find that the standard code table, which is used by most
organisms in Nature, does not give the most resilient coding against point
mutations.



1 INTRODUCTION 



Modeling population dynamics has been popular in the physics community for
a while, mostly because the complexity of the system calls for statistical and
computational tools. Physicists have brought to this subject models that lend
themselves to computer simulation, together with a preference for the simplest
possible approach. Among all others [1,2,3,4], the Penna model [5] is the most
widely used simulation scheme in population dynamics. Simulations of
population dynamics usually take many different factors of life into account.
In real life it is very difficult to isolate a single aspect of life and
neglect the effects of everything else. For example, in real life one cannot
say that the only cause of death is point mutations, because one would then
have to keep the individuals in the system from dying of "old age", of
malnutrition, or of fighting among themselves. In simulations it is much easier
to ignore all these facts of life and concentrate on one simple aspect. In this
work we have neglected all other aspects of life and concentrated only on the
effect of point mutations on the survivability of individuals.

The genetic information of all living organisms (except some viruses) is stored
in DNA. The segment of DNA which contains the information necessary to produce
a specific protein is called a gene. A real gene is composed of two different
parts: a coding portion and a non-coding portion. The coding part, the exon, is
responsible for protein synthesis, whereas the rest, the intron, does not code
for a protein, and its purpose is not yet clearly understood.

The information in DNA is coded using four different types of monomers:
adenine (A), guanine (G), cytosine (C) and thymine (T). These monomers are
the letters of the genetic alphabet, and they form three-letter words called
codons. Every codon on DNA codes for an amino acid during protein synthesis.
There are $4^3 = 64$ possible codons. However, in Nature only 20 amino acids
are available for protein coding, and as a result there is no one-to-one
correspondence between codons and amino acids. The table which determines how
the codons are mapped onto the amino acids is called the amino acid table or
genetic coding table.
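
As a minimal sketch of this many-to-one mapping, the Python snippet below enumerates the 64 codons and builds the standard table from the compact 64-letter string in which it is commonly written (base order T, C, A, G; `*` marks Stop). The names `BASES`, `SGC_AAS` and `build_table` are illustrative choices and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional base order for compact code-table strings
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(aas: str) -> dict:
    """Map each of the 64 codons to its one-letter amino acid (or '*' for Stop)."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    return dict(zip(codons, aas))


SGC = build_table(SGC_AAS)
degeneracy = Counter(SGC.values())  # how many codons code for each symbol

print(len(SGC))                          # number of codons
print(len(degeneracy))                   # distinct symbols, Stop included
print(degeneracy["L"], degeneracy["W"])  # codons for leucine and for tryptophan
```

The degeneracy counts make the absence of a one-to-one correspondence explicit: some amino acids are reached by several codons, others by only one.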

For a long time it was believed that the amino acid table of Nature, the
Standard Genetic Code (SGC), was universal. However, we now know that there are
a few exceptions in codon usage, which are confined to mitochondria and certain
protozoa [6]. In these particular tables the same number of amino acids is
used, but some of these amino acids are coded by different codons. For example,
in the SGC the codon "AUA" codes for Isoleucine, the codon "UGA" codes for
Stop, and the codons "AGA" and "AGG" code for Arginine. In the Vertebrate
Mitochondrial Code, however, they code for Methionine, Tryptophan, and Stop,
respectively [7].
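
The reassignments listed above can be checked with a short sketch that compares two code tables codon by codon. The two 64-letter strings follow the usual compact notation (base order T, C, A, G; `*` marks Stop); the helper name `table_differences` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Compact 64-letter forms of the two tables ('*' = Stop).
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VMC_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def table_differences(aas_a: str, aas_b: str) -> dict:
    """Return {codon: (symbol in table a, symbol in table b)} where the tables differ."""
    return {
        codon: (a, b)
        for codon, a, b in zip(CODONS, aas_a, aas_b)
        if a != b
    }


diff = table_differences(SGC_AAS, VMC_AAS)
for codon, (sgc_aa, vmc_aa) in sorted(diff.items()):
    # Print in RNA notation (U instead of T) to match the text.
    print(codon.replace("T", "U"), sgc_aa, "->", vmc_aa)
```

Only the four codons named in the text differ between these two tables; every other codon keeps its assignment.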

Recently, we developed a Monte Carlo simulation model inspired by the Penna
model to investigate the significance of the number of amino acids in
population dynamics. In that model, each individual was represented by a human
cytokine gene sequence, and mutation was assumed to be the only cause of death,
all other effects being eliminated. That study showed that, for maximum
tolerance against mutations, the number of amino acids which code the genetic
information lies between 20 and 24. The number used in Nature, 21 (20 amino
acids plus the Stop codon), is in this optimum range.

In this paper, we used the same model to investigate the endurance of different 
amino acid tables against point mutations neglecting any other causes of death. 



2 COMPUTATIONAL METHOD 



In our model, an individual was represented by three different gene sequences.
First, we used a real gene from Nature, the human cytokine gene (LD78, a Homo
sapiens blood lymphocyte gene on the 17th chromosome) [8], the same as in the
previous model. This gene plays an important role in the human immune system;
hence any failure in expressing this gene is lethal. Afterwards we applied the
same model to different mammalian genes; in this paper we also report the
results for the human ARNT gene (Homo sapiens aryl hydrocarbon receptor nuclear
translocator, transcript variant 1, mRNA) as an example.

Next, we created an artificial human gene, the average human gene, which
reflects the codon usage frequencies found in Homo sapiens. This gene consists
of 1000 nucleotides (each frequency is taken as an integer value per one
thousand nucleotides, as given in [10]) in a randomly chosen sequence. This
artificial gene is considered to represent the whole human genome. It helps
ensure that the results we have obtained are not specific to the human cytokine
gene but hold for the whole genome.
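
One way such an average gene could be assembled is sketched below: each codon is repeated according to an integer usage count and the resulting codons are shuffled into a random order. The counts in `toy_usage` are hypothetical placeholders, not the values of [10], and the function name `average_gene` is an assumption made for illustration.

```python
import random

# HYPOTHETICAL integer codon counts, for illustration only. A real run would
# use the integer usage values of a full codon usage table.
toy_usage = {"CTG": 4, "GAG": 3, "AAG": 3, "GCC": 2, "ATG": 1}


def average_gene(usage: dict, rng: random.Random) -> str:
    """Repeat every codon `count` times, then shuffle the codons randomly."""
    codons = [codon for codon, count in usage.items() for _ in range(count)]
    rng.shuffle(codons)
    return "".join(codons)


gene = average_gene(toy_usage, random.Random(0))
print(len(gene) // 3, "codons,", len(gene), "nucleotides")
```

Shuffling whole codons rather than single nucleotides keeps the codon usage of the artificial gene exactly equal to the input counts.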

For the sake of simplicity, we have not included reproduction, and we have
also neglected all causes of death other than mutation. Using this model, we
investigated the effect of mutations on the population size. A mutation in
this model is a change of one nucleotide in the gene. It can be either lethal
or harmless, depending on whether or not it causes a change in the amino acid
chain. We kept all mutation rates equal, as in the Jukes-Cantor mutation
scheme [11].

When a mutation takes place in the exon part, there are two possibilities:
since an amino acid can be coded by more than one codon, the changed codon will
code either for the same amino acid or for a different one (Fig. 1). If the
mutant codon still codes for the same amino acid, the mutation is taken to be
harmless, because in the end it will not affect the synthesized protein.
However, if the mutant codon codes for a different amino acid, we assume that
the protein cannot be produced and the individual simply dies.

Fig. 1. A mutation in the exon part may be either harmless or lethal, depending
on whether it changes the coded amino acid.

To be more explicit, the codons AAA and AAG code for the same amino acid,
"lysine"; hence, if AAA turns into AAG as a result of a mutation, the amino
acid will not change and the protein can be constructed safely. However, if AAA
turns into AGA, which codes for the amino acid "arginine", the amino acid chain
will change and we assume that the protein cannot be built, which means that
the represented organism will die. There can also be a mutation which converts
AAA to AAX, where X $\neq$ A, G, C, or T; then the individual dies
automatically. In our model we look at the simpler case where a mutation
changes A to one of G, C, or T, but not to X.
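
The lysine example can be written as a small check, sketched here under the assumption that the code table is held as a dictionary from codon to one-letter amino acid symbol. The function name `is_lethal` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in compact form ('*' = Stop), base order T, C, A, G.
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), SGC_AAS)}


def is_lethal(codon: str, position: int, new_base: str, table: dict = SGC) -> bool:
    """True if replacing codon[position] by new_base changes the coded amino acid."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    return table[mutant] != table[codon]


print(is_lethal("AAA", 2, "G"))  # AAA -> AAG, lysine stays lysine: False
print(is_lethal("AAA", 1, "G"))  # AAA -> AGA, lysine becomes arginine: True
```

Passing a different dictionary as `table` applies the same rule to any of the alternative code tables.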



Since reproduction is not included in the model, the population can only
diminish. The decrease in population can be found by calculating the
probability of a deleterious mutation. The probability that a mutation changes
the amino acid depends on the codon, so one needs to find the probability of
hitting each codon type. First, the probability of hitting a codon type $a$,
$P_a$, is calculated as the ratio of the number of codons of that type in the
gene, $N_a$, to the total number of codons. Then we need to exclude the
mutations that do not cause a change in the amino acid and calculate the
probability $P(d/a)$ that a change in one nucleotide of a codon of type $a$
changes the amino acid.

We used only the exon (protein-coding) part of the gene, considering any
mutation in the intron part to be harmless. As a simple example, the human
cytokine gene has a total length of 2068 nucleotides: 621 nucleotides in the
exon part and 1447 in the intron part. The probability of hitting the exon part
of the gene is simply the ratio of the length of the exon part to that of the
whole gene:



$$P(\text{hitting exon}) = \frac{621}{2068} \approx 0.3003 \qquad (1)$$



Hence, the probability of having a deleterious mutation anywhere in the gene
is simply the product of the probability of a deleterious mutation in the exon
and the probability of hitting the exon part of the gene. As the chance of
hitting any part of the gene is the same, we can neglect the intron part in the
simulation, since it only contributes a multiplicative constant. Therefore, the
probability of having a deleterious mutation in the human cytokine gene is
simply:

$$P(\text{deleterious}) \propto \sum_{a=1}^{64} \left[ P_a \, P(d/a) \right] = 0.7729 \qquad (2)$$
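
The sum in Eq. (2) can be sketched in Python as follows. The sketch assumes that, with equal mutation rates, each of the nine single-nucleotide changes of a codon is equally likely, so that $P(d/a)$ is the fraction of those nine changes that alter the coded symbol. The sequence `toy_exon` is a hypothetical five-codon example and not one of the genes studied here; the function names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in compact form ('*' = Stop), base order T, C, A, G.
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), SGC_AAS)}


def p_change_given_codon(codon: str, table: dict) -> float:
    """P(d/a): fraction of the 9 single-nucleotide changes that alter the amino acid."""
    changed = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a mutation
            mutant = codon[:pos] + base + codon[pos + 1:]
            changed += table[mutant] != table[codon]
    return changed / 9


def p_deleterious(exon: str, table: dict) -> float:
    """Sum over codon types of P_a * P(d/a), with P_a = N_a / (number of codons)."""
    codons = [exon[i:i + 3] for i in range(0, len(exon) - len(exon) % 3, 3)]
    counts = Counter(codons)  # N_a for every codon type present in the exon
    total = len(codons)
    return sum(n / total * p_change_given_codon(c, table) for c, n in counts.items())


toy_exon = "ATGAAAGGGCTGTAA"  # hypothetical sequence: ATG AAA GGG CTG TAA
print(round(p_deleterious(toy_exon, SGC), 4))
```

Codon types that do not occur in the exon have $P_a = 0$ and drop out of the sum, so iterating over the observed codons is equivalent to summing over all 64.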



The survival probability can be calculated by: 

$$P(\text{surviving}) = 1 - P(\text{deleterious}) = 0.2271 \qquad (3)$$



If we take an initial population of $N_0$ genes (individuals), then after $n$
mutations, to first order, the number of surviving individuals $N_n$ is given
by:

$$N_n \approx N_0 \, P(\text{surviving})^n \qquad (4)$$



Hence, we obtain the "probability of survival" from the slope of the logarithm
of the number of surviving individuals versus time:

$$\text{slope} \approx \ln[P(\text{surviving})] = -1.4823 \qquad (5)$$
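
The arithmetic of Eqs. (3) to (5) can be reproduced with a few lines of Python. The value 0.7729 is the one quoted in Eq. (2); the initial population of 10 000 is a hypothetical choice used only to illustrate Eq. (4).

```python
import math

p_deleterious = 0.7729               # value quoted in Eq. (2)
p_surviving = 1.0 - p_deleterious    # Eq. (3)
slope = math.log(p_surviving)        # Eq. (5): slope of ln(N_n) versus n


def expected_survivors(n0: float, n: int) -> float:
    """Eq. (4): first-order estimate of the survivors after n mutations."""
    return n0 * p_surviving ** n


print(round(p_surviving, 4), round(slope, 4))
# Hypothetical initial population of 10 000 individuals.
print([round(expected_survivors(10_000, n)) for n in range(5)])
```

Taking the logarithm of Eq. (4) gives $\ln N_n \approx \ln N_0 + n \ln P(\text{surviving})$, which is why the slope of the logarithmic plot equals $\ln P(\text{surviving})$.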



Similarly, the probability of survival can be calculated for each gene
separately. However, in this calculation, once we make a change in the gene
sequence and the individual survives, we forget about the change we have made
and restart the process for the second mutation cycle with the original gene
sequence. In Nature, if the individual survives, the second mutation cycle
starts with the mutated gene sequence and not the original one. Therefore, to
get closer to Nature, we have also written a simulation code which keeps the
mutation in the gene sequence for the next mutation cycle.



3 SIMULATION



In this simulation, the population consists of individuals, each described by
only one gene. Genes are represented by arrays which contain the integers 0, 1,
2, and 3 in place of the nucleotides Adenine (A), Guanine (G), Cytosine (C) and
Thymine (T) (or Uracil (U)), respectively. A sign bit which shows whether the
gene has a deleterious mutation (1) or not (0) is also included in the
array [8].
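
A minimal sketch of this representation is given below. The integer codes follow the text; storing the sign bit as the last element of the list is an assumption, since the text does not say where in the array it is kept, and the function names are hypothetical.

```python
# Integer codes from the text: A=0, G=1, C=2, T (or U)=3.
NUCLEOTIDE_CODE = {"A": 0, "G": 1, "C": 2, "T": 3, "U": 3}


def encode(sequence: str) -> list:
    """Return the gene as a list of integers with the sign bit appended (0 = alive)."""
    return [NUCLEOTIDE_CODE[base] for base in sequence.upper()] + [0]


def mark_dead(individual: list) -> None:
    """Set the sign bit (assumed to be the last element) to 1."""
    individual[-1] = 1


individual = encode("AUGAAA")
print(individual)       # [0, 3, 1, 0, 0, 0, 0]
mark_dead(individual)
print(individual[-1])   # 1
```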

In every time step, each individual undergoes one random mutation. If the
mutation is deleterious, i.e., if it changes the amino acid, the sign bit is
set to '1', the individual is deleted from the population (death) and the time
step is recorded. Otherwise, the sign bit is kept at '0' and the individual
survives.

When the mutation cycle is finished, the number of surviving individuals in
each time step is calculated. Since the probability of mutation is independent
of the number of individuals, the surviving individuals also give us the
population size. Hence, we have an exponential population decay whose exponent
depends on the probability of surviving, $P(\text{surviving})$. The logarithm
of the population is fitted to a straight line and the slope of the line is
calculated. All simulations are run 10 times, and the probability of surviving
is calculated from the weighted average of these 10 runs.
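
The procedure described in this section can be sketched as a short Monte Carlo loop. This is an illustrative reconstruction under stated assumptions, not the simulation code used for the results: genes are kept as strings rather than integer arrays, a single run with an unweighted least-squares fit replaces the weighted average of 10 runs, and `toy_gene`, the population size and the number of steps are hypothetical.

```python
import math
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in compact form ('*' = Stop), base order T, C, A, G.
SGC_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), SGC_AAS)}


def mutate_once(gene: str, table: dict, rng: random.Random):
    """Apply one random point mutation; return (mutated gene, lethal flag)."""
    pos = rng.randrange(len(gene))
    new_base = rng.choice([b for b in BASES if b != gene[pos]])  # equal rates
    start = pos - pos % 3  # first nucleotide of the codon that was hit
    mutant = gene[:pos] + new_base + gene[pos + 1:]
    lethal = table[mutant[start:start + 3]] != table[gene[start:start + 3]]
    return mutant, lethal


def simulate(gene: str, table: dict, n0: int, steps: int, rng: random.Random) -> list:
    """Return the population size at every time step (index 0 is the start)."""
    population = [gene] * n0
    sizes = [n0]
    for _ in range(steps):
        survivors = []
        for g in population:
            mutant, lethal = mutate_once(g, table, rng)
            if not lethal:
                survivors.append(mutant)  # a harmless mutation is kept
        population = survivors
        sizes.append(len(population))
    return sizes


def fit_slope(sizes: list) -> float:
    """Least-squares slope of ln(N) versus time step, skipping empty steps."""
    pts = [(t, math.log(n)) for t, n in enumerate(sizes) if n > 0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    num = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den


rng = random.Random(1)
toy_gene = "ATGAAAGGGCTGGCCGAGTAA"  # hypothetical exon of 7 codons
sizes = simulate(toy_gene, SGC, n0=20000, steps=4, rng=rng)
print(sizes, round(fit_slope(sizes), 3))
```

Because the population shrinks by a large factor in every step, only a few time steps carry enough individuals for a meaningful fit, which is one reason to start from a large initial population.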



4 RESULTS AND DISCUSSION 



In this work, we have investigated the following amino acid tables in addition 
to SGC: 

• Alternative Yeast Nuclear Code (AYNC), 

• Ascidian Mitochondrial Code (AMC), 

• Blepharisma Nuclear Code (BNC), 

• Ciliate, Dasycladacean and Hexamita Nuclear Code (CDHNC), 

• Echinoderm Mitochondrial Code (EMC), 

• Euplotid Nuclear Code (ENC), 

• Flatworm Mitochondrial Code (FMC), 

• Invertebrate Mitochondrial Code (IMC), 

• Mold, Protozoan, and Coelenterate Mitochondrial Code and Mycoplasma/Spiroplasma
Code (MSC),

• Vertebrate Mitochondrial Code (VMC), and 

• Yeast Mitochondrial Code (YMC). 

The results of our simulations, using two different human gene sequences, are
given in Table 1 and Table 2. We have selected these two genes as
representatives; however, the same results have been obtained for many other
human genes that we simulated. We have also used some genes from Mus musculus
(common house mouse) and Rattus (rat), for which we obtained similar results.
As these detailed results are more appropriate for an evolutionary biology
journal, we report only the representative results.

Table 1

Average slopes of the population decrease for different genetic code tables
with the human cytokine gene. The smaller the magnitude of the slope, the
higher the survival chance.

Code     Simulation          Calculation
FMC      -1.4056 ± 0.0005    1.4053
EMC      -1.4164 ± 0.0005    1.4163
IMC      -1.4320 ± 0.0001    1.4319
ENC      -1.4415 ± 0.0005    1.4409
BNC      -1.4691 ± 0.0003    1.4685
MSC      -1.4759 ± 0.0003    1.4755
CDHNC    -1.4784 ± 0.0005    1.4779
AMC      -1.4784 ± 0.0005    1.4779
SGC      -1.4830 ± 0.0005    1.4826
VMC      -1.5020 ± 0.0001    1.5017
YMC      -1.5677 ± 0.0007    1.5638
AYNC     -1.5800 ± 0.0001    1.5793

If it is assumed that the genetic code table is optimized to increase the
chance of survival against mutations, then the results of Table 1 and Table 2
raise serious questions. Even though a few code tables give results which favor
the usage of SGC (VMC, AYNC, and YMC), we can see that with a different code
table, for example FMC, our white blood cell production would be more resilient
to mutations.

To be certain that the results in Table 1 and Table 2 do not depend on the
particular genes, we created an artificial average human gene and ran
simulations using this average human gene as the representative of the
individuals. Table 3 shows the results of these simulations.

In Table 3, FMC still performs best, and the worst performance is still by
AYNC, followed by YMC. The average human gene gives results comparable to those
of the genes chosen in this work, indicating that the advantage of SGC cannot
be explained simply on the basis of single mutations, without taking
higher-order effects like protein folding into account.

Table 2

Average slopes of the population decrease for different genetic code tables
with the human ARNT gene. The smaller the magnitude of the slope, the higher
the survival chance.

Code     Simulation          Calculation
FMC      -1.4038 ± 0.0004    1.4036
EMC      -1.4101 ± 0.0007    1.4099
CDHNC    -1.4268 ± 0.0004    1.4268
IMC      -1.4288 ± 0.0004    1.4285
BNC      -1.4368 ± 0.0004    1.4368
ENC      -1.4566 ± 0.0005    1.4565
AMC      -1.4586 ± 0.0005    1.4583
MSC      -1.4609 ± 0.0006    1.4607
SGC      -1.4653 ± 0.0005    1.4650
VMC      -1.4882 ± 0.0008    1.4878
AYNC     -1.5215 ± 0.0005    1.5213
YMC      -1.5295 ± 0.0005    1.5271



5 CONCLUSION 



In this paper, we used a computer simulation to represent a living organism
under mutations. Furthermore, we changed the genetic code used in the
simulations to analyze its effect on population stability.

We have used different code tables found in Nature in connection with two
sample genes, the human cytokine gene and the human ARNT gene. Since these
genes are among the key factors for our bodies, we assumed that if the organism
fails to produce any of the proteins coded by these genes, it will not be able
to survive.

The simulations show that SGC, which is used by most vertebrates, does not
actually give the most resilient organisms against point mutations if we apply
only simple simulation rules.

Table 3

Average slopes of the population decrease for different genetic code tables
with the average human gene. The smaller the magnitude of the slope, the higher
the survival chance.

Code     Simulation          Calculation
FMC      -1.4081 ± 0.0005    1.4083
EMC      -1.4204 ± 0.0005    1.4207
IMC      -1.4438 ± 0.0003    1.4439
CDHNC    -1.4498 ± 0.0001    1.4501
BNC      -1.4554 ± 0.0006    1.4558
ENC      -1.4595 ± 0.0005    1.4596
MSC      -1.4655 ± 0.0005    1.4658
AMC      -1.4693 ± 0.0006    1.4697
SGC      -1.4712 ± 0.0004    1.4716
VMC      -1.4969 ± 0.0005    1.4971
YMC      -1.5540 ± 0.0006    1.5543
AYNC     -1.5803 ± 0.0001    1.5804

To test these results, we have also created an artificial average human gene.
If we compare SGC with the rest of the coding schemes, the results do not
change, i.e., SGC is still not the best solution. However, when we compare the
other code tables among themselves, we see that some code tables give better
results with the real human genes whereas others give better results with the
artificial average human gene. This is most pronounced for CDHNC, which
survives much better with the average human gene than with the human cytokine
gene.